Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers

Authors

  • Meelis Kull
  • Telmo de Menezes e Silva Filho
  • Peter A. Flach
Abstract

For optimal decision making under variable class distributions and misclassification costs, a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful non-parametric method that is, however, prone to overfitting on smaller datasets; hence, a parametric method based on the logistic curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers, including Naive Bayes and Adaboost, suffer from a particular distortion where these score distributions are heavily skewed. In such cases, logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, and hence logistic calibration can easily uncalibrate a perfectly calibrated classifier. In this paper, we solve all these problems with a richer class of calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for Naive Bayes and Adaboost.
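
The abstract's two technical claims, that the beta family contains the identity map and that fitting is as easy as fitting a logistic curve, can be illustrated with a minimal sketch. It assumes the three-parameter form usually given for beta calibration, sigmoid(a·ln s − b·ln(1 − s) + c), which is not spelled out on this page. The function names `beta_map` and `fit_beta` are hypothetical, the fit is a plain Newton iteration on the log-loss, and no sign constraints are placed on a and b.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite for scores at exactly 0 or 1


def beta_map(s, a, b, c):
    """Assumed beta calibration map: sigmoid(a*ln(s) - b*ln(1-s) + c)."""
    s = np.clip(np.asarray(s, dtype=float), EPS, 1.0 - EPS)
    z = a * np.log(s) - b * np.log1p(-s) + c
    return 1.0 / (1.0 + np.exp(-z))


def fit_beta(scores, labels, n_iter=50, ridge=1e-8):
    """Fit (a, b, c) as a logistic regression on ln(s) and -ln(1-s).

    Uses unregularised Newton steps on the log-loss; the tiny ridge term
    only keeps the Hessian invertible. This is an illustrative solver,
    not the authors' implementation.
    """
    s = np.clip(np.asarray(scores, dtype=float), EPS, 1.0 - EPS)
    y = np.asarray(labels, dtype=float)
    # Design matrix: two transformed-score features plus an intercept.
    X = np.column_stack([np.log(s), -np.log1p(-s), np.ones_like(s)])
    w = np.zeros(3)
    for _ in range(n_iter):
        z = np.clip(X @ w, -30.0, 30.0)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(3)
        w = w - np.linalg.solve(hess, grad)
    a, b, c = w
    return a, b, c


if __name__ == "__main__":
    # With a = b = 1 and c = 0 the map reduces to the identity.
    grid = np.linspace(0.01, 0.99, 5)
    print(beta_map(grid, 1.0, 1.0, 0.0))

    # Synthetic, already-calibrated scores: the fit should stay near (1, 1, 0).
    rng = np.random.default_rng(0)
    scores = rng.uniform(0.01, 0.99, 20000)
    labels = (rng.uniform(size=20000) < scores).astype(int)
    print(fit_beta(scores, labels))
```

Under this assumed parametrisation, a = b = 1 and c = 0 gives sigmoid(logit(s)) = s, so an already calibrated classifier can be left unchanged. A logistic curve applied directly to s has no parameter setting that reproduces s.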

Similar resources

Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers – Supplementary material

This material supplements the AISTATS 2017 paper on beta calibration and presents tables and critical difference diagrams for all results obtained in the experimental analysis. In all the tables, best results are marked in bold and subscript numbers indicate the ranks. Due to space limitations, numbers are rounded to three decimal digits, therefore, differences that occur after the third digit ...

Evidential Logistic Regression for Binary SVM Classifier Calibration

The theory of belief functions has been successfully used in many classification tasks. It is especially useful when combining multiple classifiers and when dealing with high uncertainty. Many classification approaches such as k-nearest neighbors, neural network or decision trees have been formulated with belief functions. In this paper, we propose an evidential calibration method that transfor...

Combination and Calibration Methods for Probabilistic Forecasts of Binary Events

Probabilistic forecasts of atmospheric variables are often given as relative frequencies obtained from ensembles of deterministic forecasts. The detrimental effects of imperfect models and initial conditions on the quality of such forecasts can be mitigated by calibration. This paper shows that Bayesian methods currently used to incorporate prior information can be written as special cases of a...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Simultaneous Determination of Sulfamethoxazole and Phthalazine by HPLC and Multivariate Calibration Methods

Two multivariate calibration methods are compared for the simultaneous chromatographic determination and separation of Sulfamethoxazole (SMX) and Phthalazine (PHZ) by High Performance Liquid Chromatography (HPLC). Multivariate calibration techniques such as Classical Least Squares (CLS) and Inverse Least Squares (ILS) were introduced into HPLC to determine the quantification by using UV d...

Publication date: 2017